NSF PAR Search | NSF Public Access Repository

rTisane: Externalizing conceptual models for data analysis prompts reconsideration of domain assumptions and facilitates statistical modeling

https://doi.org/10.1145/3613904.3642267

Jun, Eunice; Misback, Edward; Heer, Jeffrey; Just, René (May 2024, ACM)

Full Text Available
Understanding and Supporting Debugging Workflows in Multiverse Analysis

https://doi.org/10.1145/3544548.3581099

Gu, Ken; Jun, Eunice; Althoff, Tim (April 2023, CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems)

Multiverse analysis—a paradigm for statistical analysis that considers all combinations of reasonable analysis choices in parallel—promises to improve transparency and reproducibility. Although recent tools help analysts specify multiverse analyses, they remain difficult to use in practice. In this work, we identify debugging as a key barrier due to the latency from running analyses to detecting bugs and the scale of metadata processing needed to diagnose a bug. To address these challenges, we prototype a command-line interface tool, Multiverse Debugger, which helps diagnose bugs in the multiverse and propagate fixes. In a qualitative lab study (n=13), we use Multiverse Debugger as a probe to develop a model of debugging workflows and identify specific challenges, including difficulty in understanding the multiverse’s composition. We conclude with design implications for future multiverse analysis authoring systems.
Full Text Available
Quilt: Custom UIs for Linking Unstructured Documents to Structured Datasets

https://doi.org/10.1145/3672539.3686777

Kallanagoudar, Pragya; Anand, Chithra; Garcia, Rolando; Hicke, Rebecca M. M.; Parameswaran, Aditya; Jun, Eunice; Chasins, Sarah E. (October 2024, ACM)

Full Text Available
Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships

https://doi.org/10.1145/3491102.3501888

Jun, Eunice; Seo, Audrey; Heer, Jeffrey; Just, René (April 2022, CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems)

Proper statistical modeling incorporates domain theory about how concepts relate and details of how data were measured. However, data analysts currently lack tool support for recording and reasoning about domain assumptions, data collection, and modeling choices in an integrated manner, leading to mistakes that can compromise scientific validity. For instance, generalized linear mixed-effects models (GLMMs) help answer complex research questions, but omitting random effects impairs the generalizability of results. To address this need, we present Tisane, a mixed-initiative system for authoring generalized linear models with and without mixed-effects. Tisane introduces a study design specification language for expressing and asking questions about relationships between variables. Tisane contributes an interactive compilation process that represents relationships in a graph, infers candidate statistical models, and asks follow-up questions to disambiguate user queries to construct a valid model. In case studies with three researchers, we find that Tisane helps them focus on their goals and assumptions while avoiding past mistakes.
Full Text Available
Hypothesis Formalization: Empirical Findings, Software Limitations, and Design Implications

https://doi.org/10.1145/3476980

Jun, Eunice; Birchfield, Melissa; De Moura, Nicole; Heer, Jeffrey; Just, René (February 2022, ACM Transactions on Computer-Human Interaction)

Data analysis requires translating higher level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization. In a formative content analysis of 50 research papers, we find that researchers highlight decomposing a hypothesis into sub-hypotheses, selecting proxy variables, and formulating statistical models based on data collection design as key steps. In a lab study, we find that analysts fixated on implementation and shaped their analyses to fit familiar approaches, even if sub-optimal. In an analysis of software tools, we find that tools provide inconsistent, low-level abstractions that may limit the statistical models analysts use to formalize hypotheses. Based on these observations, we characterize hypothesis formalization as a dual-search process balancing conceptual and statistical considerations constrained by data and computation and discuss implications for future tools.
Full Text Available
Latent Space Cartography: Visual Analysis of Vector Space Embeddings

https://doi.org/10.1111/cgf.13672

Liu, Yang; Jun, Eunice; Li, Qisheng; Heer, Jeffrey (June 2019, Computer Graphics Forum)

Full Text Available
Tea: A High-level Language and Runtime System for Automating Statistical Analysis

https://doi.org/10.1145/3332165.3347940

Jun, Eunice; Daum, Maureen; Roesch, Jared; Chasins, Sarah; Berger, Emery; Just, René; Reinecke, Katharina (October 2019, User Interface Software & Technology)

Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introduce Tea, a high-level declarative language and runtime system. In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests and then executes them to test the hypothesis. We evaluate Tea using a suite of statistical analyses drawn from popular tutorials. We show that Tea generally matches the choices of experts while automatically switching to non-parametric tests when parametric assumptions are not met. We simulate the effect of mistakes made by non-expert users and show that Tea automatically avoids both false negatives and false positives that could be produced by the application of incorrect statistical tests.
Full Text Available
